

\begin{figure}[t]
\centering
\includegraphics[width=2.5in]{images/kc-before-split.eps} %2.3
\caption{A learning curve whose error rate increases with increasing practice opportunities. \label{fig:before_split}}
\end{figure}


\begin{figure}[t]
%\centering
%\left
\begin{subfigure}[t]{1.5in}
\includegraphics[width=1.8in]{images/kc-split-1.eps}
\caption{Automatically discovered easy skill.}
\end{subfigure}
\begin{subfigure}[t]{1.5in}
%\centering
\includegraphics[width=1.8in]{images/kc-split-2.eps}
\vspace{-10pt}
\caption{Automatically discovered difficult skill.}
\end{subfigure}
\caption{\label{fig:example} Learning curves obtained by applying \methodname to the skill of Figure~\ref{fig:before_split}. \methodname splits the skill by difficulty.
The learning curves of the discovered skills show a decreasing error rate with increasing practice opportunities.}
 \label{fig:after_split}
\end{figure}




Our dataset was collected from a commercial mathematics tutoring system used by grade 7 students from 2011 to 2013. It contains 27,010,721 homework item responses and 7,633,771 test item responses. Experts labeled 887 skills in total. Each item maps to exactly one skill, while a skill can map to between 1 and 7 items. We consider only the first attempt on each homework item. As mentioned before, we conduct the learning curve analysis on the homework data and estimate the IRT model on the test data.

As mentioned in Section~\ref{sec:method}, we use the Spearman rank-order correlation to evaluate the quality of skills.
A skill whose rank correlation is not significantly negative is identified as an ill-defined skill. We define the average monotonicity as the quality measure of a set of skills, computed as the mean Spearman rank-order correlation across skills:
\[
 \text{average monotonicity}  = \sum_k \frac{\text{rank\_correlation(skill $k$)}}{\text{\# of skills}}
\]
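
To illustrate the measure, consider a hypothetical skill (the numbers below are made up for illustration and do not come from our data) whose error rates at practice opportunities $1, 2, 3, 4$ are $0.6, 0.5, 0.3, 0.4$. Assuming that the rank correlation is computed between the opportunity index and the error rate, and that there are no ties, the Spearman coefficient can be obtained from the rank differences $d_i$ as
\[
 \rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)} = 1 - \frac{6\,(9+1+4+4)}{4\,(4^2-1)} = 1 - \frac{108}{60} = -0.8 .
\]
If two further hypothetical skills had rank correlations of $0.1$ and $0.4$, the average monotonicity of the three skills would be $(-0.8+0.1+0.4)/3 = -0.1$.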

We evaluated the 887 original expert-defined skills using the rank correlation and identified 477 skills with little or no learning based on the homework data.
For 85 of these 477 skills, all items also appear in the test data and therefore have item difficulty estimates.
In this study, we first focus on refining these 85 ill-defined skills.
There are 257 distinct items mapped to these 85 ill-defined skills.
The average monotonicity of these 85 skills is around 0.10.

After applying \methodname, 72 of the 85 skills are split into two new skills each, resulting in 157 skills in total (including the 13 original skills that were not split).
The new average monotonicity of the 157 skills is around $-0.01$, which is better than that of the original skill definition, since a lower rank correlation indicates a more consistently decreasing error rate.
Among the 72 split skills, 28 turn into two new skills that both have good learning curves.
The skills obtained from these 28 splits contain 89 distinct items.
Thus, \methodname improves about 35\% (89/257) of the items that were originally mapped to ill-defined expert skills. We are aware that \methodname makes the simplifying assumption that a non-decreasing learning curve is due to an ill-defined expert skill, yet other causes could exist (e.g., students not having enough practice). The causes of non-decreasing learning curves need further investigation.
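
The counts reported above can be checked with simple arithmetic, assuming that every split produces exactly two skills and that unsplit skills are kept unchanged:
\[
 72 \times 2 + (85 - 72) = 144 + 13 = 157, \qquad \frac{89}{257} \approx 0.346 \approx 35\%.
\]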

Figure~\ref{fig:after_split} shows an example of the learning curves of the new skills obtained by applying \methodname to the skill of Figure~\ref{fig:before_split}. The skills discovered by \methodname show a decreasing trend in their learning curves. The x-axis represents the practice opportunity count, and the y-axis represents the error rate across students at each practice opportunity. The size of each point represents the logarithm (base 10) of the number of observations at that point. The black line is the learning curve, and the grey line is a regression line fitted to the student response data to make the increasing or decreasing trend visible. Note that the number of observations decreases significantly at later practice opportunities, so the trend of the curve is mainly determined by the earlier points, which have larger sample sizes.
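
As a sketch of how the plotted quantities could be defined (the symbols $n_t$, $w_t$, $e_t$ and $s_t$ are introduced here only for illustration), let $n_t$ be the number of first-attempt responses observed at practice opportunity $t$ and $w_t$ the number of those that are incorrect. Then
\[
 e_t = \frac{w_t}{n_t}, \qquad s_t \propto \log_{10} n_t ,
\]
where $e_t$ is the error rate on the y-axis and $s_t$ is the point size. For example, a point based on $10{,}000$ observations has $\log_{10} n_t = 4$, whereas a point based on $100$ observations has $\log_{10} n_t = 2$, so a hundredfold drop in sample size only halves the size measure.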



\begin{comment}
\begin{figure*}[t]
\begin{subfigure}[t]{2.3in}
%\epsfig{file=images/a-good-lc.eps, height=1in, width=1in}
\includegraphics[width=2.3in]{images/a-good-lc.eps}
\caption{A decreasing error (good) learning curve \jpg{This is not the best example. As we talked in the data story, this has a kink..} }
\end{subfigure}
\begin{subfigure}[t]{2.3in}
\centering
%\epsfig{file=fly.eps}
\includegraphics[width=2.3in]{images/a-bad-lc.eps}
\caption{A flat  learning curve}
\end{subfigure}
\begin{subfigure}[t]{2.3in}
\centering
%\epsfig{file=fly.eps}
\includegraphics[width=2.3in]{images/kc-before-split.eps}
\caption{An increasing error (bad) learning curve}
\end{subfigure}
\caption{\label{fig:lc} Examples of learning curves \jpg{put your ``quality metric" in the caption, and remove the box from the graph (can't read it anyway)}}
\end{figure*}
\end{comment}



